\section{Conclusions and future work}
\label{sec-conclusions}

We have presented a technique that allows implementations of most of
the \commonlisp{} sequence functions that are simultaneously fast,
maintainable, and portable, provided the compiler supplied by the
implementation is sufficiently sophisticated to apply certain standard
optimization techniques.

The main exception for which our technique is generally unable to
compete with a native implementation is when the sequence is a bit
vector.  Any implementation that accesses the elements of the bit
vector one at a time, rather than using native instructions that can
handle an entire word at a time, is unable to match the native
performance \cite{Baker:1990:EIB:121989.121991}.  On the other hand,
our technique allows the \commonlisp{} implementation to treat bit
vectors as an exceptional case, and use our general technique for the
other cases.

We have yet to perfect the exact declarations to include in our
implementation, and the exact places where these declarations should
be added.  Different \commonlisp{} implementations have different
requirements in this respect, so this work may have to be repeated for
different implementations.

At the moment, we have been working exclusively with \sbcl{}, for the
simple reason that the \sbcl{} compiler does provide the optimizations
that are required in order for our technique to yield excellent
performance.  We intend to experiment with other major implementations
as well in order to determine which ones are suited for our
technique.  For our technique to provide fast code, the compiler must
be able to remove redundant tests.  A test $T_2$ is redundant if it is
dominated%
\footnote{Dominance is a graph-theory concept that is frequently used
  in optimizing compilers to transform intermediate code in the form
  of an instruction graph.}
by a test $T_1$ testing the exact same condition, and $T_2$ can be
reached from only one of the two branches of $T_1$.  In our technique,
an outer macro provides $T_1$, whereas $T_2$ occurs in the inner loop
of the function.

Other than requiring an optimizing compiler, our technique uses only
standard \commonlisp{} features in order to generate fast code.  Our
technique will work with systems without such a compiler, but no
improved performance will be observed.

The \cleavir{} compiler framework of the \sicl{} project will
ultimately include a technique for \emph{path replication} in
intermediate code, that, while not specifically meant for the kind of
optimization required for the technique presented in this paper, will
have the same effect as more direct techniques currently used in
advanced compilers.

Our technique is well adapted to processing sequences with a
relatively large number of elements.  When the sequence contains few
elements, the overhead of the call and of processing the keyword
arguments may be significant.  Also, we do not take advantage of any
declaration of element type, in the case when the sequence is a
vector.  We plan to investigate the possibility of modifying our
macros so that definitions of specialized functions are automatically
generated, leaving a fairly small general function that can then be
inlined.
